[test] Evaluate model performance and accuracy with UCM #642

Merged
mag1c-h merged 3 commits into ModelEngine-Group:develop from ayaka836:test_model_validate
Jan 23, 2026
@ayaka836 ayaka836 commented Jan 13, 2026

Purpose

This PR introduces a comprehensive model validation test suite to evaluate both performance (latency metrics) and accuracy (F1-score) under the following three key UCM caching scenarios:

Naive: No cache hits (hit rate = 0%). Serves as the baseline with full recomputation.
Sparse: Evaluated at hit rate = 0% to assess the performance and accuracy gains enabled by sparsity-aware mechanisms.
Prefix Caching (PC): Evaluated across multiple hit rates [0%, 30%, 50%, 80%, 100%] to demonstrate the impact of prefix reuse on inference latency.
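
For readers who want to see the scenario matrix concretely, the three scenarios above can be enumerated roughly as follows. This is only a sketch: `Scenario`, `build_scenarios`, and `split_prompt` are hypothetical names, not identifiers from `test_model_validate.py`, and the assumption that a hit rate of N% means the first N% of input tokens are served from cache is an illustration, not something this PR states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One test configuration (hypothetical structure, not from the PR)."""
    name: str
    hit_rate: int  # cache hit rate in percent


def build_scenarios():
    """Enumerate the scenarios listed in the PR description."""
    scenarios = [Scenario("naive", 0), Scenario("sparse", 0)]
    # Prefix caching is swept over the hit rates named in the description.
    scenarios += [Scenario("pc", rate) for rate in (0, 30, 50, 80, 100)]
    return scenarios


def split_prompt(input_tokens, hit_rate):
    """Assumed mapping: the first hit_rate% of tokens are cached, the rest are recomputed."""
    cached = input_tokens * hit_rate // 100
    return cached, input_tokens - cached


for s in build_scenarios():
    cached, fresh = split_prompt(8000, s.hit_rate)
    print(f"{s.name:<6} hit_rate={s.hit_rate:>3}%  cached={cached}  recomputed={fresh}")
```

In a pytest suite, a list like this would typically feed `pytest.mark.parametrize`, so each scenario and hit rate shows up as its own test case.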

Modifications

Added test cases that verify whether the model is compatible with prefix caching (PC) and sparsification.

Test

==============================================================================================================
Hit Rate (%) Input Tokens    Output Tokens   Concurrency  TTFT_mean [s]        TPOT_mean [s]        E2E_mean [s]        
--------------------------------------------------------------------------------------------------------------
0            8000            200             8            2.9202               0.0300               8.9285              
30           8000            200             8            2.8951               0.0297               8.8281              
50           8000            200             8            2.8731               0.0300               8.8691              
80           8000            200             8            2.9001               0.0299               8.8861              
100          8000            200             8            2.8534               0.0305               8.9540              
==============================================================================================================
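
As a quick sanity check on the latency table, the three columns can be cross-checked against each other. The sketch below assumes the common definitions (TTFT = time to first token, TPOT = time per output token after the first, E2E = end-to-end latency), under which E2E is approximately TTFT + TPOT × (output tokens − 1). The PR does not state how the suite computes these metrics, so treat the formula as an assumption; `estimate_e2e` is a hypothetical helper name.

```python
def estimate_e2e(ttft_s, tpot_s, output_tokens):
    """Approximate end-to-end latency from TTFT and TPOT (assumed relationship)."""
    return ttft_s + tpot_s * (output_tokens - 1)


# (hit rate %, TTFT_mean, TPOT_mean, E2E_mean) copied from the table above.
rows = [
    (0, 2.9202, 0.0300, 8.9285),
    (30, 2.8951, 0.0297, 8.8281),
    (50, 2.8731, 0.0300, 8.8691),
    (80, 2.9001, 0.0299, 8.8861),
    (100, 2.8534, 0.0305, 8.9540),
]

for hit_rate, ttft, tpot, e2e in rows:
    estimate = estimate_e2e(ttft, tpot, output_tokens=200)
    print(f"hit={hit_rate:>3}%  reported={e2e:.4f}s  estimated={estimate:.4f}s  "
          f"diff={e2e - estimate:+.4f}s")
```

With the values above, every estimate lands within about 0.05 s of the reported E2E mean, so the columns are mutually consistent under this assumed definition. The small residual is expected because TPOT is rounded to four decimals and the reported values are means over concurrent requests.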

========================================
Test            f1-score
----------------------------------------
PC              0.0229      
========================================
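
For context on the accuracy number, the PR reports an F1-score but does not say which variant. A common choice for long-context QA benchmarks is token-level F1 between the generated answer and the reference; the sketch below shows that variant under that assumption. `token_f1` is a hypothetical name, and the whitespace tokenization is a simplification that may differ from what the suite actually does.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-level F1 between two strings (assumed metric, whitespace tokens)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("the cat sat", "the cat"))
```

Under this definition, long free-form generations scored against short reference answers get low precision, which is one possible reason for a small score. Whether that applies to the result above depends on the dataset and output length the suite uses.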

@ayaka836 ayaka836 force-pushed the test_model_validate branch 2 times, most recently from 1f9b201 to 75d09dd Compare January 19, 2026 01:52
@Wwwzff Wwwzff force-pushed the test_model_validate branch from 75d09dd to f9de3fd Compare January 19, 2026 02:08
@ayaka836 ayaka836 force-pushed the test_model_validate branch from f9de3fd to 7472627 Compare January 19, 2026 02:25
Comment thread test/suites/E2E/test_model_validate.py
@ayaka836 ayaka836 force-pushed the test_model_validate branch 2 times, most recently from 103abf5 to c8de951 Compare January 19, 2026 08:23
Wwwzff previously approved these changes Jan 19, 2026
@Wwwzff Wwwzff left a comment

LGTM

@mag1c-h mag1c-h merged commit 720b122 into ModelEngine-Group:develop Jan 23, 2026
11 of 12 checks passed